Compact Rule Extraction for Hierarchical Phrase-based Translation
نویسندگان
چکیده
This paper introduces two novel approaches for extracting compact grammars for hierarchical phrase-based translation. The first is a combinatorial optimization approach and the second is a Bayesian model over Hiero grammars using Variational Bayes for inference. In contrast to the conventional Hiero (Chiang, 2007) rule extraction algorithm , our methods extract compact models reducing model size by 17.8% to 57.6% without impacting translation quality across several language pairs. The Bayesian model is particularly effective for resource-poor languages with evidence from Korean-English translation. To the best of our knowledge, this is the first alternative to Hiero-style rule extraction that finds a more compact synchronous grammar without hurting translation performance.
منابع مشابه
Kriya - An end-to-end Hierarchical Phrase-based MT System
This paper describes Kriya – a new statistical machine translation (SMT) system that uses hierarchical phrases, whichwere first introduced in the Hieromachine translation system (Chiang, 2007). Kriya supports both a grammar extraction module for synchronous context-free grammars (SCFGs) and a CKY-based decoder. There are several re-implementations of Hiero in the machine translation community, ...
متن کاملA constrained hierarchical rule extraction method based on phrase collocations and high-frequency backbone words
Hierarchical-phrase based machine translation model is a popular translation model which combines advantages of phrase-based translation models and syntax-based translation models. However, since there are no linguistic constraints in the procedure of current hierarchical phrase extraction, there are a large number of redundant generalized rules extracted. In this paper, we propose two strategi...
متن کاملNTT System Description for the WMT2006 Shared Task
We present two translation systems experimented for the shared-task of “Workshop on Statistical Machine Translation,” a phrase-based model and a hierarchical phrase-based model. The former uses a phrasal unit for translation, whereas the latter is conceptualized as a synchronousCFG in which phrases are hierarchically combined using non-terminals. Experiments showed that the hierarchical phraseb...
متن کاملScalable Variational Inference for Extracting Hierarchical Phrase-based Translation Rules
We present a Variational-Bayes model for learning rules for the Hierarchical phrasebased model directly from the phrasal alignments. Our model is an alternative to heuristic rule extraction in hierarchical phrase-based translation (Chiang, 2007), which uniformly distributes the probability mass to the extracted rules locally. In contrast, in our approach the probability assigned to a rule is gl...
متن کاملExtraction Programs: A Unified Approach to Translation Rule Extraction
We provide a general algorithmic schema for translation rule extraction and show that several popular extraction methods (including phrase pair extraction, hierarchical phrase pair extraction, and GHKM extraction) can be viewed as specific instances of this schema. This work is primarily intended as a survey of the dominant extraction paradigms, in which we make explicit the close relationship ...
متن کامل